{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To ensure high quality in analytical modeling or analysis, data must be validated and cleansed. In our scenerio, we are given two sets of similar room type, one is sourced from Expedia, another is sourced from Booking.com. We will normalize both sets to have a common record. Fuzzy matching is a technique that I am using. It works with matches that may be less than 100% perfect. Fuzzy matching is blind to obvious synonyms.\n",
    "\n",
    "In this exercise, I take room type from Expedia, compare and match it's associated room type in Booking.com. In another words, we match records between two data sources.\n",
    "\n",
    "I have defined a match as something more like “a human with some experiences would have guessed these rooms were the same thing”. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv('room_type.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Expedia</th>\n",
       "      <th>Booking.com</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Deluxe Room, 1 King Bed</td>\n",
       "      <td>Deluxe King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Standard Room, 1 King Bed, Accessible</td>\n",
       "      <td>Standard King Roll-in Shower Accessible</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Grand Corner King Room, 1 King Bed</td>\n",
       "      <td>Grand Corner King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Suite, 1 King Bed (Parlor)</td>\n",
       "      <td>King Parlor Suite</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>High-Floor Premium Room, 1 King Bed</td>\n",
       "      <td>High-Floor Premium King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Traditional Double Room, 2 Double Beds</td>\n",
       "      <td>Double Room with Two Double Beds</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Room, 1 King Bed, Accessible</td>\n",
       "      <td>King Room - Disability Access</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Deluxe Room, 1 King Bed</td>\n",
       "      <td>Deluxe King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Deluxe Room</td>\n",
       "      <td>Deluxe Room (Non Refundable)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Room, 2 Double Beds (19th to 25th Floors)</td>\n",
       "      <td>Two Double Beds - Location Room (19th to 25th ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                     Expedia  \\\n",
       "0                    Deluxe Room, 1 King Bed   \n",
       "1      Standard Room, 1 King Bed, Accessible   \n",
       "2         Grand Corner King Room, 1 King Bed   \n",
       "3                 Suite, 1 King Bed (Parlor)   \n",
       "4        High-Floor Premium Room, 1 King Bed   \n",
       "5     Traditional Double Room, 2 Double Beds   \n",
       "6               Room, 1 King Bed, Accessible   \n",
       "7                    Deluxe Room, 1 King Bed   \n",
       "8                                Deluxe Room   \n",
       "9  Room, 2 Double Beds (19th to 25th Floors)   \n",
       "\n",
       "                                         Booking.com  \n",
       "0                                   Deluxe King Room  \n",
       "1            Standard King Roll-in Shower Accessible  \n",
       "2                             Grand Corner King Room  \n",
       "3                                  King Parlor Suite  \n",
       "4                       High-Floor Premium King Room  \n",
       "5                   Double Room with Two Double Beds  \n",
       "6                      King Room - Disability Access  \n",
       "7                                   Deluxe King Room  \n",
       "8                       Deluxe Room (Non Refundable)  \n",
       "9  Two Double Beds - Location Room (19th to 25th ...  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I created the data set, so, it is very clean"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### FuzzyWuzzy\n",
    "\n",
    "Let's give a try, compare and match three pairs of the data.\n",
    "\n",
    "1). Ratio, - Compares the entire string similarity, in order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "62"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from fuzzywuzzy import fuzz\n",
    "fuzz.ratio('Deluxe Room, 1 King Bed', 'Deluxe King Room')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is telling us that the \"Deluxe Room, 1 King Bed\" and \"Deluxe King Room\" pair are about 62% the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "69"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.ratio('Traditional Double Room, 2 Double Beds', 'Double Room with Two Double Beds')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The \"Traditional Double Room, 2 Double Beds\" and \"Double Room with Two Double Beds\" pair are about 69% the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "74"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.ratio('Room, 2 Double Beds (19th to 25th Floors)', 'Two Double Beds - Location Room (19th to 25th Floors)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The \"Room, 2 Double Beds (19th to 25th Floors)\" and \"Two Double Beds - Location Room (19th to 25th Floors)\" pair are about 74% the same."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I am a little disappointed with these. It turns out, the naive approach is far too sensitive to minor differences in word order, missing or extra words, and other such issues.\n",
    "\n",
    "2). Partial ratio, - Compares partial string similarity.\n",
    "\n",
    "We are still using the same data pairs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "69"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.partial_ratio('Deluxe Room, 1 King Bed', 'Deluxe King Room')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "83"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.partial_ratio('Traditional Double Room, 2 Double Beds', 'Double Room with Two Double Beds')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "63"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.partial_ratio('Room, 2 Double Beds (19th to 25th Floors)', 'Two Double Beds - Location Room (19th to 25th Floors)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Comparing partial string brings a little better results for some pairs.\n",
    "\n",
    "3). Token sort ratio, - Ignores word order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "84"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.token_sort_ratio('Deluxe Room, 1 King Bed', 'Deluxe King Room')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "78"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.token_sort_ratio('Traditional Double Room, 2 Double Beds', 'Double Room with Two Double Beds')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "83"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.token_sort_ratio('Room, 2 Double Beds (19th to 25th Floors)', 'Two Double Beds - Location Room (19th to 25th Floors)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The best so far.\n",
    "\n",
    "4). Token set ratio, - Ignores duplicated words. It is similar with token sort ratio, but a little bit more flexible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.token_set_ratio('Deluxe Room, 1 King Bed', 'Deluxe King Room')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "78"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.token_set_ratio('Traditional Double Room, 2 Double Beds', 'Double Room with Two Double Beds')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "97"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fuzz.token_set_ratio('Room, 2 Double Beds (19th to 25th Floors)', 'Two Double Beds - Location Room (19th to 25th Floors)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Seems token set ratio is the best fit for my data. According to this discovery, I decided to apply token set ratio to my entire data set."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When setting ratio > 70."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Expedia</th>\n",
       "      <th>Booking.com</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Deluxe Room, 1 King Bed</td>\n",
       "      <td>Deluxe King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Standard Room, 1 King Bed, Accessible</td>\n",
       "      <td>Standard King Roll-in Shower Accessible</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Grand Corner King Room, 1 King Bed</td>\n",
       "      <td>Grand Corner King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Suite, 1 King Bed (Parlor)</td>\n",
       "      <td>King Parlor Suite</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>High-Floor Premium Room, 1 King Bed</td>\n",
       "      <td>High-Floor Premium King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Traditional Double Room, 2 Double Beds</td>\n",
       "      <td>Double Room with Two Double Beds</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Room, 1 King Bed, Accessible</td>\n",
       "      <td>King Room - Disability Access</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Deluxe Room, 1 King Bed</td>\n",
       "      <td>Deluxe King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Deluxe Room</td>\n",
       "      <td>Deluxe Room (Non Refundable)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Room, 2 Double Beds (19th to 25th Floors)</td>\n",
       "      <td>Two Double Beds - Location Room (19th to 25th ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                     Expedia  \\\n",
       "0                    Deluxe Room, 1 King Bed   \n",
       "1      Standard Room, 1 King Bed, Accessible   \n",
       "2         Grand Corner King Room, 1 King Bed   \n",
       "3                 Suite, 1 King Bed (Parlor)   \n",
       "4        High-Floor Premium Room, 1 King Bed   \n",
       "5     Traditional Double Room, 2 Double Beds   \n",
       "6               Room, 1 King Bed, Accessible   \n",
       "7                    Deluxe Room, 1 King Bed   \n",
       "8                                Deluxe Room   \n",
       "9  Room, 2 Double Beds (19th to 25th Floors)   \n",
       "\n",
       "                                         Booking.com  \n",
       "0                                   Deluxe King Room  \n",
       "1            Standard King Roll-in Shower Accessible  \n",
       "2                             Grand Corner King Room  \n",
       "3                                  King Parlor Suite  \n",
       "4                       High-Floor Premium King Room  \n",
       "5                   Double Room with Two Double Beds  \n",
       "6                      King Room - Disability Access  \n",
       "7                                   Deluxe King Room  \n",
       "8                       Deluxe Room (Non Refundable)  \n",
       "9  Two Double Beds - Location Room (19th to 25th ...  "
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def get_ratio(row):\n",
    "    name = row['Expedia']\n",
    "    name1 = row['Booking.com']\n",
    "    return fuzz.token_set_ratio(name, name1)\n",
    "\n",
    "df[df.apply(get_ratio, axis=1) > 70].head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9029126213592233"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df[df.apply(get_ratio, axis=1) > 70]) / len(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Over 90% of the pairs exceed a match score of 70."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Expedia</th>\n",
       "      <th>Booking.com</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Deluxe Room, 1 King Bed</td>\n",
       "      <td>Deluxe King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Standard Room, 1 King Bed, Accessible</td>\n",
       "      <td>Standard King Roll-in Shower Accessible</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Grand Corner King Room, 1 King Bed</td>\n",
       "      <td>Grand Corner King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Suite, 1 King Bed (Parlor)</td>\n",
       "      <td>King Parlor Suite</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>High-Floor Premium Room, 1 King Bed</td>\n",
       "      <td>High-Floor Premium King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Traditional Double Room, 2 Double Beds</td>\n",
       "      <td>Double Room with Two Double Beds</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Room, 1 King Bed, Accessible</td>\n",
       "      <td>King Room - Disability Access</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Deluxe Room, 1 King Bed</td>\n",
       "      <td>Deluxe King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Deluxe Room</td>\n",
       "      <td>Deluxe Room (Non Refundable)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Room, 2 Double Beds (19th to 25th Floors)</td>\n",
       "      <td>Two Double Beds - Location Room (19th to 25th ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Room, 1 King Bed (19th to 25 Floors)</td>\n",
       "      <td>King Bed - Location Room (19th to 25th Floors)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Deluxe Room</td>\n",
       "      <td>Deluxe Double Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Junior Suite, 1 King Bed with Sofa Bed</td>\n",
       "      <td>Junior Suite</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Signature Room, 2 Queen Beds</td>\n",
       "      <td>Signature Two Queen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Signature Room, 1 King Bed</td>\n",
       "      <td>Signature One King</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Premium Room, 2 Queen Beds</td>\n",
       "      <td>Premium Two Queen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Studio, 1 King Bed with Sofa bed, Corner</td>\n",
       "      <td>Corner King Studio</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Club Room, 2 Queen Beds</td>\n",
       "      <td>Club Two Queen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Club Room, 1 King Bed</td>\n",
       "      <td>Club One King</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Club Room, Premium 2 Queen Beds</td>\n",
       "      <td>Club Premium Two Queen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Suite, 1 Bedroom</td>\n",
       "      <td>One Bedroom Suite</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Deluxe Room, City View</td>\n",
       "      <td>Deluxe King Or Queen Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Deluxe Room, Lake View</td>\n",
       "      <td>Deluxe King Or Queen Room with Lake View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Club Room, City View (Club Lounge Access for 2...</td>\n",
       "      <td>Club Level King Or Queen Room with City View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Deluxe Room, 1 King Bed</td>\n",
       "      <td>Deluxe Room - One King Bed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Deluxe Room, 2 Queen Beds</td>\n",
       "      <td>Deluxe Room - Two Queen Beds</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Premier Room, 1 King Bed (Royal Club)</td>\n",
       "      <td>Royal Club Premier Room - One King Bed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Room, 2 Double Beds, Non Smoking</td>\n",
       "      <td>Double Room with Two Double Beds</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Room, 1 King Bed, Non Smoking (LEISURE)</td>\n",
       "      <td>King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>Executive Room, 1 King Bed, Non Smoking</td>\n",
       "      <td>Executive King Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72</th>\n",
       "      <td>Standard Room, Lagoon View</td>\n",
       "      <td>Standard Room Dolphin Lagoon View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>73</th>\n",
       "      <td>Standard Room, Ocean View</td>\n",
       "      <td>Standard Room With Ocean View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74</th>\n",
       "      <td>Standard Room, Oceanfront</td>\n",
       "      <td>Standard Room With Ocean Front View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75</th>\n",
       "      <td>City Room, City View</td>\n",
       "      <td>Room With City View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>76</th>\n",
       "      <td>Room, Partial Ocean View</td>\n",
       "      <td>Partical Ocean View Room</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77</th>\n",
       "      <td>Deluxe Room, Corner</td>\n",
       "      <td>Corner Deluxe Studio</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>78</th>\n",
       "      <td>Room, Ocean View</td>\n",
       "      <td>Room With Ocean View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>Room, 1 King Bed, City View</td>\n",
       "      <td>City View With One King Bed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>Room, 2 Double Beds, City View</td>\n",
       "      <td>City View With Two Double Beds</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81</th>\n",
       "      <td>Room, 2 Double Beds, Partial Ocean View</td>\n",
       "      <td>Partial Ocean View With Two Double Beds</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82</th>\n",
       "      <td>Room, 2 Double Beds, Accessible, Partial Ocean...</td>\n",
       "      <td>Accessible Partial Ocean View With Two Double ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83</th>\n",
       "      <td>Room, 2 Double Beds, Ocean View (Diamond Head)</td>\n",
       "      <td>Club Diamond Head Ocean View With Two Double B...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>Club Room, 1 King Bed, Oceanfront</td>\n",
       "      <td>Club Ocean Front With One King Bed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>Club Suite, 1 King Bed, Accessible, Ocean View</td>\n",
       "      <td>Accessible Club Ocean View Suite With One King...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86</th>\n",
       "      <td>Club Suite, 1 King Bed, Oceanfront (No Resort ...</td>\n",
       "      <td>Club Ocean Front Suite King Bed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>Standard Room, Mountain View (City View - Kona...</td>\n",
       "      <td>Kona Tower City / Mountain View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>Standard Room, Partial Ocean View (Kona Tower)...</td>\n",
       "      <td>Kona Tower Partial Ocean View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89</th>\n",
       "      <td>Standard Room, Partial Ocean View (Waikiki Tow...</td>\n",
       "      <td>Waikiki Tower Partial Ocean View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>Standard Room, Ocean View (Waikiki Tower) - No...</td>\n",
       "      <td>Waikiki Tower Ocean View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>91</th>\n",
       "      <td>Room, 2 Queen Beds (Waikiki View)</td>\n",
       "      <td>Queen Room With Two Queen Beds and Waikiki View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>92</th>\n",
       "      <td>Room, Ocean View</td>\n",
       "      <td>King Or Two Queen Room With Ocean View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>94</th>\n",
       "      <td>Regency Club, Mountain View</td>\n",
       "      <td>Regency Club Mountain View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95</th>\n",
       "      <td>Regency Club, Ocean View</td>\n",
       "      <td>Regency Club Ocean View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>96</th>\n",
       "      <td>Room, 1 King Bed, Ocean View</td>\n",
       "      <td>Ocean View Room With King Bed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>97</th>\n",
       "      <td>Room, 1 King Bed, Resort View (Alii)</td>\n",
       "      <td>Alii Tower Resort View With King Bed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98</th>\n",
       "      <td>Room, 1 King Bed, Accessible, Resort View (Ali...</td>\n",
       "      <td>Alii Tower Resort View With King Bed - Mobilit...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99</th>\n",
       "      <td>Room, 1 King Bed, Accessible, View (Rainbow, B...</td>\n",
       "      <td>Rainbow Tower Ocean View With King Bed - Mobil...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>Room, 1 King Bed, Ocean View (Alii)</td>\n",
       "      <td>Alii Tower Ocean View With King Bed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>Room, 1 King Bed, Oceanfront (Rainbow)</td>\n",
       "      <td>Rainbow Tower Ocean Front with King Bed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>Junior Suite, 1 King Bed, Accessible (Roll-in ...</td>\n",
       "      <td>Junior Suite - Accessible Roll-in Shower</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>101 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               Expedia  \\\n",
       "0                              Deluxe Room, 1 King Bed   \n",
       "1                Standard Room, 1 King Bed, Accessible   \n",
       "2                   Grand Corner King Room, 1 King Bed   \n",
       "3                           Suite, 1 King Bed (Parlor)   \n",
       "4                  High-Floor Premium Room, 1 King Bed   \n",
       "5               Traditional Double Room, 2 Double Beds   \n",
       "6                         Room, 1 King Bed, Accessible   \n",
       "7                              Deluxe Room, 1 King Bed   \n",
       "8                                          Deluxe Room   \n",
       "9            Room, 2 Double Beds (19th to 25th Floors)   \n",
       "10                Room, 1 King Bed (19th to 25 Floors)   \n",
       "11                                         Deluxe Room   \n",
       "12              Junior Suite, 1 King Bed with Sofa Bed   \n",
       "13                        Signature Room, 2 Queen Beds   \n",
       "14                          Signature Room, 1 King Bed   \n",
       "15                          Premium Room, 2 Queen Beds   \n",
       "16            Studio, 1 King Bed with Sofa bed, Corner   \n",
       "17                             Club Room, 2 Queen Beds   \n",
       "18                               Club Room, 1 King Bed   \n",
       "19                     Club Room, Premium 2 Queen Beds   \n",
       "20                                    Suite, 1 Bedroom   \n",
       "21                              Deluxe Room, City View   \n",
       "22                              Deluxe Room, Lake View   \n",
       "23   Club Room, City View (Club Lounge Access for 2...   \n",
       "25                             Deluxe Room, 1 King Bed   \n",
       "26                           Deluxe Room, 2 Queen Beds   \n",
       "27               Premier Room, 1 King Bed (Royal Club)   \n",
       "28                    Room, 2 Double Beds, Non Smoking   \n",
       "29             Room, 1 King Bed, Non Smoking (LEISURE)   \n",
       "30             Executive Room, 1 King Bed, Non Smoking   \n",
       "..                                                 ...   \n",
       "72                          Standard Room, Lagoon View   \n",
       "73                           Standard Room, Ocean View   \n",
       "74                           Standard Room, Oceanfront   \n",
       "75                                City Room, City View   \n",
       "76                            Room, Partial Ocean View   \n",
       "77                                 Deluxe Room, Corner   \n",
       "78                                   Room, Ocean View    \n",
       "79                         Room, 1 King Bed, City View   \n",
       "80                      Room, 2 Double Beds, City View   \n",
       "81             Room, 2 Double Beds, Partial Ocean View   \n",
       "82   Room, 2 Double Beds, Accessible, Partial Ocean...   \n",
       "83      Room, 2 Double Beds, Ocean View (Diamond Head)   \n",
       "84                   Club Room, 1 King Bed, Oceanfront   \n",
       "85      Club Suite, 1 King Bed, Accessible, Ocean View   \n",
       "86   Club Suite, 1 King Bed, Oceanfront (No Resort ...   \n",
       "87   Standard Room, Mountain View (City View - Kona...   \n",
       "88   Standard Room, Partial Ocean View (Kona Tower)...   \n",
       "89   Standard Room, Partial Ocean View (Waikiki Tow...   \n",
       "90   Standard Room, Ocean View (Waikiki Tower) - No...   \n",
       "91                   Room, 2 Queen Beds (Waikiki View)   \n",
       "92                                    Room, Ocean View   \n",
       "94                         Regency Club, Mountain View   \n",
       "95                            Regency Club, Ocean View   \n",
       "96                        Room, 1 King Bed, Ocean View   \n",
       "97                Room, 1 King Bed, Resort View (Alii)   \n",
       "98   Room, 1 King Bed, Accessible, Resort View (Ali...   \n",
       "99   Room, 1 King Bed, Accessible, View (Rainbow, B...   \n",
       "100                Room, 1 King Bed, Ocean View (Alii)   \n",
       "101             Room, 1 King Bed, Oceanfront (Rainbow)   \n",
       "102  Junior Suite, 1 King Bed, Accessible (Roll-in ...   \n",
       "\n",
       "                                           Booking.com  \n",
       "0                                     Deluxe King Room  \n",
       "1              Standard King Roll-in Shower Accessible  \n",
       "2                               Grand Corner King Room  \n",
       "3                                    King Parlor Suite  \n",
       "4                         High-Floor Premium King Room  \n",
       "5                     Double Room with Two Double Beds  \n",
       "6                        King Room - Disability Access  \n",
       "7                                     Deluxe King Room  \n",
       "8                         Deluxe Room (Non Refundable)  \n",
       "9    Two Double Beds - Location Room (19th to 25th ...  \n",
       "10      King Bed - Location Room (19th to 25th Floors)  \n",
       "11                                  Deluxe Double Room  \n",
       "12                                        Junior Suite  \n",
       "13                                 Signature Two Queen  \n",
       "14                                  Signature One King  \n",
       "15                                   Premium Two Queen  \n",
       "16                                  Corner King Studio  \n",
       "17                                      Club Two Queen  \n",
       "18                                       Club One King  \n",
       "19                              Club Premium Two Queen  \n",
       "20                                   One Bedroom Suite  \n",
       "21                           Deluxe King Or Queen Room  \n",
       "22            Deluxe King Or Queen Room with Lake View  \n",
       "23        Club Level King Or Queen Room with City View  \n",
       "25                          Deluxe Room - One King Bed  \n",
       "26                        Deluxe Room - Two Queen Beds  \n",
       "27              Royal Club Premier Room - One King Bed  \n",
       "28                    Double Room with Two Double Beds  \n",
       "29                                           King Room  \n",
       "30                                 Executive King Room  \n",
       "..                                                 ...  \n",
       "72                   Standard Room Dolphin Lagoon View  \n",
       "73                       Standard Room With Ocean View  \n",
       "74                 Standard Room With Ocean Front View  \n",
       "75                                 Room With City View  \n",
       "76                            Partical Ocean View Room  \n",
       "77                                Corner Deluxe Studio  \n",
       "78                                Room With Ocean View  \n",
       "79                         City View With One King Bed  \n",
       "80                      City View With Two Double Beds  \n",
       "81             Partial Ocean View With Two Double Beds  \n",
       "82   Accessible Partial Ocean View With Two Double ...  \n",
       "83   Club Diamond Head Ocean View With Two Double B...  \n",
       "84                  Club Ocean Front With One King Bed  \n",
       "85   Accessible Club Ocean View Suite With One King...  \n",
       "86                     Club Ocean Front Suite King Bed  \n",
       "87                     Kona Tower City / Mountain View  \n",
       "88                       Kona Tower Partial Ocean View  \n",
       "89                    Waikiki Tower Partial Ocean View  \n",
       "90                            Waikiki Tower Ocean View  \n",
       "91     Queen Room With Two Queen Beds and Waikiki View  \n",
       "92              King Or Two Queen Room With Ocean View  \n",
       "94                          Regency Club Mountain View  \n",
       "95                             Regency Club Ocean View  \n",
       "96                       Ocean View Room With King Bed  \n",
       "97                Alii Tower Resort View With King Bed  \n",
       "98   Alii Tower Resort View With King Bed - Mobilit...  \n",
       "99   Rainbow Tower Ocean View With King Bed - Mobil...  \n",
       "100                Alii Tower Ocean View With King Bed  \n",
       "101            Rainbow Tower Ocean Front with King Bed  \n",
       "102           Junior Suite - Accessible Roll-in Shower  \n",
       "\n",
       "[101 rows x 2 columns]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df.apply(lambda row: fuzz.token_set_ratio(row['Expedia'], row['Booking.com']), axis=1) > 60]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9805825242718447"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df[df.apply(lambda row: fuzz.token_set_ratio(row['Expedia'], row['Booking.com']), axis=1) > 60]) / len(df)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
